Optimal rates and adaptation in the single-index model using aggregation
We want to recover the regression function in the single-index model. Using
an aggregation algorithm with local polynomial estimators, we answer, in
particular, the second part of Question 2 from Stone (1982) on the optimal
convergence rate. The procedure constructed here has strong adaptation
properties: it adapts both to the smoothness of the link function and to the
unknown index. Moreover, the procedure locally adapts to the distribution of
the design. We propose new upper bounds for the local polynomial estimator
(which are results of independent interest) that allow a fairly general
design. The behavior of this algorithm is studied through numerical
simulations. In particular, we show empirically that it improves substantially
over empirical risk minimization.
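Below is a minimal, hypothetical sketch of the kind of procedure the abstract describes: local linear (degree-1 local polynomial) estimators fitted along candidate index directions, combined by exponential-weight aggregation on a held-out split. The function names, Gaussian kernel, bandwidth h, and temperature parameter are illustrative assumptions, not the paper's calibrated choices.

```python
# Hypothetical sketch (not the paper's exact procedure): local linear
# estimators along candidate index directions, aggregated with
# exponential weights computed on a held-out split.
import numpy as np

def local_linear(x_train, y_train, x_eval, h):
    """Degree-1 local polynomial estimator with a Gaussian kernel."""
    preds = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        w = np.exp(-0.5 * ((x_train - x0) / h) ** 2)      # kernel weights
        X = np.column_stack([np.ones_like(x_train), x_train - x0])
        A = X.T @ (w[:, None] * X) + 1e-8 * np.eye(2)     # small ridge for stability
        b = X.T @ (w * y_train)
        preds[i] = np.linalg.solve(A, b)[0]               # intercept = fit at x0
    return preds

def aggregation_weights(X, y, directions, h=0.3, temperature=1.0):
    """Exponential-weight aggregation over candidate index directions:
    fit on the first half of the data, weight by held-out risk."""
    n = len(y)
    tr, va = np.arange(n // 2), np.arange(n // 2, n)
    risks = []
    for theta in directions:
        z = X @ theta                                     # projected design
        pred = local_linear(z[tr], y[tr], z[va], h)
        risks.append(np.mean((y[va] - pred) ** 2))
    risks = np.asarray(risks)
    w = np.exp(-temperature * len(va) * (risks - risks.min()))
    return w / w.sum()
```

Candidate directions could, for instance, be drawn uniformly on the unit sphere; the aggregated prediction is then the weighted average of the per-direction estimators.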
Nonparametric regression with martingale increment errors
We consider the problem of adaptive estimation of the regression function in
a framework where we replace ergodicity assumptions (such as independence or
mixing) by another structural assumption on the model. Namely, we propose
adaptive upper bounds for kernel estimators with data-driven bandwidth
(Lepski's selection rule) in a regression model where the noise is a martingale
increment. This includes, as special cases, the usual i.i.d. regression and
autoregressive models. The cornerstone tool for this study is a new result for
self-normalized martingales, called "stability", which is of independent
interest. In the first part, we use only the martingale increment
structure of the noise. We give an adaptive upper bound using a random rate
that involves the occupation time near the estimation point. Thanks to this
approach, the theoretical study of the statistical procedure is disconnected
from the usual ergodicity properties, such as mixing. Then, in the second part,
we make the link with the usual minimax theory of deterministic rates. Under a
beta-mixing assumption on the covariate process, we prove that the random rate
considered in the first part is equivalent, with high probability, to a
deterministic rate, which is the usual adaptive minimax one.
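As an illustration, here is a minimal sketch of a Lepski-type bandwidth selection rule for a Nadaraya-Watson estimator at a single point; the actual procedure and thresholds in the paper differ. The constant kappa, the noise level sigma, and the use of a simple occupation count near x0 are assumptions made for exposition.

```python
# Hypothetical sketch of a Lepski-type bandwidth selection rule for a
# Nadaraya-Watson estimator at a single point x0; constants are placeholders.
import numpy as np

def nw_estimate(x, y, x0, h):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / max(np.sum(w), 1e-12)

def lepski_bandwidth(x, y, x0, bandwidths, sigma=1.0, kappa=2.0):
    """Select the largest admissible bandwidth: h is admissible when its
    estimate stays within a noise band of every smaller-bandwidth estimate.
    The band uses the occupation time near x0 (number of covariates within
    the smaller bandwidth) in place of a deterministic rate."""
    n = len(x)
    bandwidths = sorted(bandwidths)
    est = [nw_estimate(x, y, x0, h) for h in bandwidths]
    best = bandwidths[0]
    for i in range(len(bandwidths)):
        admissible = True
        for j in range(i):
            occ = max(np.sum(np.abs(x - x0) <= bandwidths[j]), 1)  # occupation time
            if abs(est[i] - est[j]) > kappa * sigma * np.sqrt(np.log(n) / occ):
                admissible = False
                break
        if admissible:
            best = bandwidths[i]
    return best
```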
Robust Methods for High-Dimensional Linear Learning
We propose statistically robust and computationally efficient linear learning
methods in the high-dimensional batch setting, where the number of features d
may exceed the sample size n. We employ, in a generic learning setting, two
algorithms depending on whether the considered loss function is
gradient-Lipschitz or not. Then, we instantiate our framework on several
applications including vanilla sparse, group-sparse and low-rank matrix
recovery. This leads, for each application, to efficient and robust learning
algorithms that reach near-optimal estimation rates under heavy-tailed
distributions and in the presence of outliers. For vanilla s-sparsity, we are
able to reach the s log(d)/n rate under heavy tails and η-corruption,
at a computational cost comparable to that of non-robust analogs. We provide an
efficient implementation of our algorithms in an open-source Python library
called linlearn, by means of which we carry out numerical experiments that
confirm our theoretical findings, together with a comparison to other recent
approaches proposed in the literature.
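The following is a hypothetical sketch of one robust strategy in the spirit of this line of work: gradient descent on the squared loss with a median-of-means gradient estimate, plus a soft-thresholding (ell_1 proximal) step for vanilla sparsity. It is not necessarily the paper's exact algorithm, and the function names, block count, step size, and regularization level are placeholders.

```python
# Hypothetical sketch (placeholder constants, not the paper's exact
# algorithm): gradient descent on the squared loss with a median-of-means
# gradient estimate, plus soft-thresholding for vanilla sparsity.
import numpy as np

def mom_gradient(X, y, w, n_blocks, rng):
    """Coordinate-wise median of per-block gradients of the squared loss;
    a minority of corrupted blocks cannot drag the estimate away."""
    blocks = np.array_split(rng.permutation(len(y)), n_blocks)
    grads = [X[b].T @ (X[b] @ w - y[b]) / len(b) for b in blocks]
    return np.median(np.stack(grads), axis=0)

def robust_sparse_fit(X, y, n_blocks=11, lr=0.1, lam=0.01, n_iter=200, seed=0):
    """Proximal gradient descent: MOM gradient step followed by
    soft-thresholding (the ell_1 proximal operator)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = w - lr * mom_gradient(X, y, w, n_blocks, rng)
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w
```

Re-randomizing the blocks at each iteration, as above, is one common design choice; fixing the blocks once up front is another, and changes the corruption level the estimator tolerates.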